Preface


Closed-loop guidance laws resulting from differential game
models are seldom realized in practice. Several approximations
to the closed-loop law, based upon updating a reference open-loop
trajectory, have been proposed. This thesis presents the
results of my attempt to apply a differential dynamic programming
scheme with a new convergence control parameter technique
to an air-to-air missile intercept problem using nonlinear
dynamics.

Abstract

An intercept problem between an air-to-air missile
and an aircraft is modeled as a zero-sum, free final time
differential game which includes nonlinear dynamics and a
payoff related to the kill probability. Previous research
has shown that the currently used guidance scheme,
proportional navigation, is nonoptimal in this type of
problem formulation and that a higher kill probability is possible
with a guidance law based upon differential game theory.

A differential dynamic programming method is applied
to the intercept problem in the search for a real-time feedback
solution. A convergence control procedure is introduced
in an attempt to speed the convergence of the typically
slow solution methods. The resulting closed-loop guidance law
is compared to both proportional navigation and some exact
open-loop solutions by means of an off-line simulation on a
CDC 6600 computer.

The  method  does  not  yield  a real-time  solution  for  this 
problem  and  does  not  give  improvement  over  a proportional 
navigation  scheme. 

APPLICATION OF DIFFERENTIAL DYNAMIC PROGRAMMING
TO  AN  AIR-TO-AIR  MISSILE  GUIDANCE  PROBLEM 
MODELED  AS  A DIFFERENTIAL  GAME 

I.  Introduction 


Background 

Proportional navigation, whereby a pursuer is guided
toward a target at a rate proportional to the measured
rate of rotation of the pursuer-target line-of-sight, is the
principal guidance law currently in use with most air-to-air
missiles (Ref 10). It has been shown that proportional
navigation is optimal for problems using linear dynamics
and non-maneuvering targets (Ref 5: 28?-288). Several
attempts (Refs 1, 2, 3, 4) have been made to devise
optimal control laws using nonlinear dynamics,
which offer an alternative to proportional navigation if
formulated as a closed-loop feedback strategy. One example
(Ref 4) requires that the evader's future control strategy
be known, but does not allow the evader to take advantage
of the pursuer's limitations in predicting the controls.

The theory of differential games (Ref 6) provides a
more realistic model of the pursuit-evasion problem.
The evader's natural desire to escape, and his ability
to convert poor pursuer strategy into an advantage, can be
included in the guidance philosophy. Correspondingly, any
nonoptimal play by the evader results in a more favorable
condition for the pursuer.

Optimal open-loop controls can be found for the
problem through the solution of a two point boundary value
problem which arises from the application of optimization
conditions (Ref 5: 212-246). Since these controls are
open-loop, they do not allow the combatants to capitalize
on each other's errors. Near-optimal feedback strategies
based upon a linearization about a periodically updated
nominal trajectory (resulting from the open-loop controls)
have been proposed (Refs 1, 2, 3). They provide some
real-time, near-optimal controls; however, the nominal
saddle-point solution is required for the linearization, and
the updating must be accomplished often enough to keep the
assumed linearization valid. This represents an enormous
investment in computational time and storage space when
applied to problems which include nonlinear dynamics and
realistic maneuvers.

A comparison between proportional navigation and
differential game guidance (Ref 11), in which nonlinear dynamics
and target maneuverability are allowed, demonstrates
that proportional navigation is not optimal. An off-line
computer simulation (Ref 11: 94-100) was used to solve the
problem, but a real-time application was not realized. The
potential gains involved make the search for a real-time
implementation worthwhile.

Statement  of  the  Problem 

An intercept problem between a heat-seeking, air-to-air
missile and an aircraft (Ref 11), modeled as a zero-sum,
free final time differential game between two intelligent
combatants, forms the model for this thesis. A differential
dynamic programming algorithm (Ref 8) is used to obtain a
closed-loop solution for the problem. The aim is to
test this algorithm for the possibility of obtaining a
real-time guidance law for a short-duration
(typically less than six seconds) air intercept problem
by periodically updating a computed control history.

Each  combatant  can  change  his  control  at  updating  points 
to  capitalize  on  deficiencies  in  the  adversary's  strategy. 

A convergence control procedure (Ref 8) was included
in an attempt to accommodate convergence problems associated
with the inclusion of nonlinear dynamics and to help keep
the computational time to a minimum, without significantly
reducing the accuracy of the final solution. The major
emphasis of this thesis is the search for a real-time
implementation of the differential game feedback guidance
law for the nonlinear model.

Overview 

Chapter  II  discusses  the  mathematical  aspects  of 
differential  games.  The  dynamic  programming  algorithm 
used  in  obtaining  the  closed-loop  control  strategies  is 
explained  in  Chapter  III,  while  the  game  scenario  is 
presented  in  Chapter  IV.  The  results  obtained  in  the 
application  of  this  algorithm,  and  those  resulting  from 
an  application  of  proportional  navigation  to  the  missile- 
aircraft  intercept  problem  are  compared  in  Chapter  V. 


II.  Differential  Game  Theory 


Mathematical  Formulation 

The  zero-sum  differential  game  may  consist  of  the  state 
equations,  some  path  or  terminal  constraints,  a terminal 
(stopping)  condition  which  determines  when  the  game  ends, 
and  a payoff  or  cost  function.  The  state  equations  which 
describe  the  motion  of  the  two  players  are  represented  as 

$$\dot{x} = f(x, u, v, t), \qquad x(t_0) = x_0 \tag{2-1}$$

where  x is  an  n-dimensional  vector  which  represents  the 
state  of  each  combatant,  u is  the  vector  control  of  the 
pursuer  (minimizer),  and  v is  the  vector  control  of  the 
evader  (maximizer).  Constraints  may  be  imposed  upon  the 
controls  of  the  form 


$$C(x_p, u) \le 0, \qquad C(x_e, v) \le 0 \tag{2-2}$$

where x_p and x_e represent the pursuer and evader components
of  the  state  vector.  In  addition,  terminal  constraints  of 
the  form 

$$\Psi[x(t_f), t_f] = 0 \tag{2-3}$$


may be included. For situations in which the final time
is left free and no terminal constraints are imposed, some
stopping condition must be specified, for example dJ/dt = 0.

The cost function is expressed in general as

$$J = \phi[x(t_f), t_f] + \int_{t_0}^{t_f} L(x, u, v, t)\,dt \tag{2-4}$$

The  cost  is  a numerical  measure  for  determining  the  outcome 
of  the  game  and  for  evaluating  the  effectiveness  of  a 
particular  selected  strategy.  The  game  is  termed  zero-sum 
because  there  is  a single  payoff  and  one  player's  gain  is 
the  other  player's  loss.  The  pursuer's  goal  is  to  minimize 
the  cost,  J,  while  the  evader  strives  to  maximize  it.  This 
forms  the  basis  upon  which  each  player  selects  his  controls. 

The objective of the game is to determine optimal control
strategies, u* and v*, such that

$$J(u^*, v) \le J(u^*, v^*) \le J(u, v^*) \tag{2-5}$$

If  the  pair  u*  and  v*  can  be  found,  it  is  termed  a saddle 
point  of  the  game. 
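As a minimal illustration of the saddle-point definition, Eq (2-5), the sketch below checks a finite payoff table for pure saddle points: rows stand for pursuer choices u (the minimizer) and columns for evader choices v (the maximizer). The table and the function name pure_saddle_points are hypothetical; a differential game is not a finite matrix game, so this only shows the pair of inequalities being tested.

```python
def pure_saddle_points(table):
    """Return (i, j) pairs where table[i][j] is a pure saddle point.

    Row player (pursuer, u) minimizes; column player (evader, v) maximizes.
    (i, j) is a saddle point when
        table[i][j'] <= table[i][j] <= table[i'][j]   for all i', j',
    the discrete analogue of Eq (2-5).
    """
    saddles = []
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            evader_cannot_raise = value >= max(row)                     # J(u*, v) <= J(u*, v*)
            pursuer_cannot_lower = value <= min(r[j] for r in table)    # J(u*, v*) <= J(u, v*)
            if evader_cannot_raise and pursuer_cannot_lower:
                saddles.append((i, j))
    return saddles


# Hypothetical payoff table: rows are pursuer choices, columns evader choices.
J = [[3, 5, 4],
     [2, 4, 1],
     [6, 7, 8]]
print(pure_saddle_points(J))  # → [(1, 1)]
```

A table such as [[1, 0], [0, 1]] has no pure saddle point, which is the discrete counterpart of a game in which the pair u*, v* cannot be found.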

Necessary  Conditions  for  a Solution 

The  problem  under  consideration  in  this  thesis  is  a 
free  final  time  differential  game  without  terminal  constraints. 
A necessary condition for the saddle-point solution is that
the Hamiltonian, H, defined as

$$H(x, \lambda, u, v, t) = \lambda^T f + L \tag{2-6}$$

be maximized for admissible values of v and minimized for
admissible values of u. For games in which the Hamiltonian
is separable in u and v, where

$$H = H_e(x, v) + H_p(x, u) \tag{2-7}$$


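the following necessary conditions apply (Ref 5):

$$\dot{\lambda}^T = -\frac{\partial H}{\partial x}, \qquad \frac{\partial H}{\partial u} = 0, \qquad \frac{\partial H}{\partial v} = 0 \tag{2-8}$$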


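These conditions hold if there are no control constraints.
For the case where control constraints are imposed, the
following conditions apply (Ref 5):

$$\dot{\lambda}^T = -\frac{\partial H}{\partial x} - \nu^T\frac{\partial C}{\partial x}, \qquad \frac{\partial H}{\partial u} + \nu^T\frac{\partial C}{\partial u} = 0, \qquad \frac{\partial H}{\partial v} + \nu^T\frac{\partial C}{\partial v} = 0 \tag{2-9}$$

where λ represents the n-dimensional co-state vector and ν
is the Lagrange multiplier vector, which obeys the
following (Ref 5: 108-109):

$$\nu \begin{cases} = 0 & \text{for } C < 0 \\ \neq 0 & \text{for } C = 0 \end{cases} \tag{2-10}$$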


The  Two  Point  Boundary  Value  Problem 

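The application of the necessary conditions, Eqs (2-8)
or (2-9), results in expressions for the saddle-point
controls, u* and v*. These controls, u*(x, λ, t) and
v*(x, λ, t), are substituted into the state and co-state
equations to form a two point boundary value problem
(TPBVP) of the form:

$$\begin{aligned} \dot{x} &= f(x, \lambda, t), & x(t_0) &= x_0 \\ \dot{\lambda} &= g(x, \lambda, t), & \lambda^T(t_f) &= \frac{\partial \phi}{\partial x}\Big|_{t_f} \\ & & H(t_f) &= -\frac{\partial \phi}{\partial t}\Big|_{t_f} \end{aligned} \tag{2-11}$$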


The solution to the TPBVP yields open-loop controls of the
form

$$u(t) = u(x_0, \lambda_0, t), \qquad v(t) = v(x_0, \lambda_0, t) \tag{2-12}$$


These controls are termed open-loop because they depend
only upon the initial conditions, the time, and the
assumption that each player will employ the optimal
strategy. They provide no means for either combatant
to capitalize upon nonoptimal play by the adversary. One
method for determining control strategies that adapt to
variations in the opponent's strategy is to update the
solution to the TPBVP periodically, based upon more current
information. This philosophy rests upon the fact that, when
both players play optimally, the optimal closed-loop and
open-loop controls produce the same control time histories
and state trajectories. This idea forms the basis for the
determination of closed-loop controls which are able to turn
one player's nonoptimal strategy into an advantage for the
other.


III.  Closed-Loop  Control  Strategy 


The traditional open-loop solution to the TPBVP, Eqs
(2-11), requires that initial values of the co-states, λ(t_0),
be known and utilized in a forward integration. These values
are difficult to arrive at, and they must be reasonably
close to the optimal co-states if a solution to the TPBVP
(the open-loop controls) is to be obtained. Further
complications arise because the TPBVP is extremely sensitive
to even small variations in the initial co-state values.
Large trajectory deviations may result, and convergence may
be inhibited (even precluded), as indicated in Fig. 1 (Ref 9).
Without an accurate TPBVP solution, closed-loop controls
based upon the updating of open-loop controls are impossible.


Fig.  1.  Co-state  Sensitivity 
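The co-state sensitivity can be seen on a toy problem that is not the intercept model: minimize the integral of ½u² subject to ẋ = a x + u. The necessary conditions give u = -λ, so the state and co-state obey ẋ = a x - λ and λ̇ = -a λ, and x(t_f) changes by -sinh(a t_f)/a per unit change in λ(t_0). The sketch below, assuming a = 2 and t_f = 3 purely for illustration (the function name shoot is hypothetical), integrates the pair forward by RK4 from a guessed λ(t_0) and compares a finite-difference sensitivity with that closed form.

```python
import math


def shoot(lam0, x0=1.0, a=2.0, tf=3.0, n=3000):
    """Integrate xdot = a*x - lam, lamdot = -a*lam forward (RK4) from a
    guessed initial co-state lam0 and return x(tf)."""
    h = tf / n

    def f(x, lam):
        return a * x - lam, -a * lam

    x, lam = x0, lam0
    for _ in range(n):
        k1 = f(x, lam)
        k2 = f(x + 0.5 * h * k1[0], lam + 0.5 * h * k1[1])
        k3 = f(x + 0.5 * h * k2[0], lam + 0.5 * h * k2[1])
        k4 = f(x + h * k3[0], lam + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        lam += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x


a, tf, eps = 2.0, 3.0, 1e-6
# central-difference sensitivity dx(tf)/dlam(t0)
numeric = (shoot(eps, a=a, tf=tf) - shoot(-eps, a=a, tf=tf)) / (2 * eps)
analytic = -math.sinh(a * tf) / a
print(numeric, analytic)  # both close to -100.86
```

An error of only 0.01 in the guessed λ(t_0) thus moves x(t_f) by about one unit, and the factor grows exponentially with a t_f.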

The Differential Dynamic Programming (DDP) method is
an alternative method for obtaining closed-loop controls.


The DDP Method

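The DDP method (Refs 7, 8) attempts to solve the
following problem:

$$\begin{aligned} -\frac{\partial J^o}{\partial t} &= \min_{u}\max_{v} H\!\left(x, u, v, J^o_x, t\right), \qquad J^o(x, t_f) = \phi(x, t_f) \\ H &= L(x, u, v, t) + J_x f(x, u, v, t) \end{aligned} \tag{3-1}$$

$$\frac{\partial H}{\partial u}\Big|_{u^*} = 0, \qquad \frac{\partial H}{\partial v}\Big|_{v^*} = 0 \tag{3-2}$$

where the superscript o denotes optimality and J_x = ∂J°/∂x.
The solution to Eq (3-1) yields an optimal cost, J°(x, t),
which, when substituted into Eqs (3-2), results in optimal
closed-loop controls u* and v*. Unfortunately, Eq (3-1) does
not, in general, lend itself to analytical solution.
The DDP method provides a numerical tool for obtaining the
solution iteratively.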

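If Eq (3-1) is expanded to first order about the
optimal trajectory, x*, the following relationships emerge:

$$\begin{aligned} -\dot{a} &= H(\bar{x}, u^*, v^*, J_x, t) - H(\bar{x}, \bar{u}, \bar{v}, J_x, t), & a(t_f) &= 0 \\ -\dot{J}_x &= H_x(\bar{x}, \bar{u}, \bar{v}, J_x, t), & J_x(t_f) &= \phi_x[\bar{x}(t_f)] \end{aligned} \tag{3-3}$$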

where ū and v̄ are nominal controls which result in a
nominal trajectory x̄, and a(t) is the difference between
the optimal cost, J°, resulting from the application of
the optimal controls u* and v* in the state equations, and
the cost, J, obtained from using the nominal controls ū and v̄:

$$a(t) = J^o(u^*, v^*) - J(\bar{u}, \bar{v}) \tag{3-4}$$

The predicted cost change, a(t_0), can be expressed as

$$a(t_0) = a_p(t_0) + a_e(t_0) \tag{3-5}$$

where a_p(t_0) is the predicted cost change due to changes in
the pursuer's control and a_e(t_0) is that due to changes in
the evader's control. Because of the linearization, these
relationships, Eqs (3-3), are valid only if δx(t) = x*(t) - x̄(t)
is not excessively large within the time interval remaining.

The  derivation  of  Eqs  (3-3)  is  presented  in  Appendix  C. 

The mechanization of the DDP algorithm for obtaining
optimal closed-loop controls is as follows:

(a) Nominal controls, ū and v̄, are used in the
state equations, Eq (2-1), which are then integrated forward
in time until the stopping criterion is reached, to determine
a nominal trajectory x̄(t) and a cost J from Eq (2-4).

(b) Eqs (3-3) are integrated backward in time,
using the same nominal controls, ū and v̄, with appropriate
boundary conditions. At each step of the backward integration,
the conditions of Eqs (3-2) are enforced to obtain new controls
u*(t) and v*(t), which are stored in the computer.

(c) The new controls, u* and v*, are applied to
Eqs (2-1) and (2-4) in a forward integration as in step (a).
If the change in cost, ΔJ = J(u*, v*) - J(ū, v̄),
is of the same order as a(t_0), ū(t) and v̄(t) are
replaced by u*(t) and v*(t). Steps (a) through (c) are
repeated until a(t_0), a_p(t_0), and a_e(t_0) are small.

(d) When the predicted cost changes have been
decreased to a prescribed small value, the computed controls
are used as the optimal controls for a specified period of
time (a fixed portion of the trajectory).

(e)  At  the  end  of  the  specified  time  interval, 
the  entire  sequence  is  begun  again  using  the  conditions 
at  the  end  of  the  interval  as  the  new  initial  conditions. 
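Steps (a) through (c) can be sketched on a scalar linear-quadratic game that is far simpler than the intercept problem, an assumption chosen so that the saddle point is known in closed form: ẋ = u + v and J = ½x(T)² plus the integral of (½u² - ½γ²v²), with γ² = 2, x(0) = 1, and T = 1. Because H does not depend on x, the backward sweep of J_x is trivial, and the saddle-point controls are u* = -2/3 and v* = 1/3 with J = 1/3. The function name, the Euler integration, and the simplification of step (c) to always accepting the new controls are illustrative choices, not the thesis's implementation.

```python
def ddp_lq_game(x0=1.0, T=1.0, gamma2=2.0, N=50, max_iter=200, tol=1e-14):
    """Steps (a)-(c) of a first-order DDP iteration for the toy game
        xdot = u + v,
        J = 0.5*x(T)**2 + integral of (0.5*u**2 - 0.5*gamma2*v**2) dt,
    where the pursuer u minimizes J and the evader v maximizes it."""
    dt = T / N

    def forward(u, v):
        # Steps (a)/(c): Euler-integrate the state and accumulate the cost.
        x, running = x0, 0.0
        for k in range(N):
            running += dt * (0.5 * u[k] ** 2 - 0.5 * gamma2 * v[k] ** 2)
            x += dt * (u[k] + v[k])
        return x, running + 0.5 * x ** 2

    def H(Jx, uk, vk):
        return 0.5 * uk ** 2 - 0.5 * gamma2 * vk ** 2 + Jx * (uk + vk)

    u, v = [0.0] * N, [0.0] * N          # nominal control histories
    xT, J = forward(u, v)
    history = []                         # (predicted a(t0), actual dJ) per pass
    for it in range(1, max_iter + 1):
        # Step (b): backward sweep. H has no x-dependence here, so
        # -dJx/dt = H_x = 0 and Jx keeps its terminal value phi_x = x(T).
        Jx = xT
        a = 0.0
        u_new, v_new = [0.0] * N, [0.0] * N
        for k in reversed(range(N)):
            u_new[k] = -Jx               # dH/du = 0 (pursuer minimizes)
            v_new[k] = Jx / gamma2       # dH/dv = 0 (evader maximizes)
            a += dt * (H(Jx, u_new[k], v_new[k]) - H(Jx, u[k], v[k]))
        # Step (c): forward pass with the new controls; always accepted here.
        xT_new, J_new = forward(u_new, v_new)
        history.append((a, J_new - J))
        u, v, xT, J = u_new, v_new, xT_new, J_new
        if abs(a) < tol:
            break
    return u, v, J, it, history


u, v, J, iterations, history = ddp_lq_game()
print(round(u[0], 4), round(v[0], 4), round(J, 4))  # → -0.6667 0.3333 0.3333
```

The history list pairs each predicted change a(t_0) with the actual change ΔJ, which is the comparison that the acceptance test of step (c) and the convergence control below rely upon.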

If the actual cost change, ΔJ, is not on the order of
the predicted cost change, a(t_0), some form of convergence
control must be supplied to assure that a solution will be
found. The method used in this problem is the Convergence
Control Parameter (CCP) method (Ref 8).


The CCP Method

The magnitude of Δx(t) can be restricted if the
control changes between iterations, Δu(t) and Δv(t),
are not excessively large. The idea behind CCP is to restrict
the magnitude of the control changes by means of convergence
parameters attached to Δu(t) and Δv(t).

An augmented Hamiltonian, H̃, is formed as follows:

$$\tilde{H} = H(x, \bar{u} + \Delta u, \bar{v} + \Delta v, J_x, t) + \tfrac{1}{2}\Delta u^T P_p \Delta u - \tfrac{1}{2}\Delta v^T P_e \Delta v \tag{3-6}$$

where P_p and P_e are the convergence parameters. These are
diagonal matrices whose elements are positive values. A
saddle point of H̃ is sought through the DDP method.

The predicted cost change, written as

$$a(t_0) = a_p(t_0) + a_e(t_0)$$

allows a_p(t_0) to depend upon P_p and a_e(t_0) upon P_e.
An analysis of the relative magnitudes of the predicted and
actual cost changes allows the penalty terms to be adjusted
individually for the best convergence characteristics, as
explained in Appendix D.

If the predicted cost change, a(t_0), is approximately
equal to the actual cost change, ΔJ, the first-order expansion
of Eq (3-1) is valid. The relationship between ΔJ
and a(t_0) can be plotted and divided into several regions
to indicate how effective the selected penalty values are
in producing good convergence of the solution.
This is explained in detail in Appendix D.
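The mechanism of Eq (3-6) can be seen on a scalar quadratic term. Minimizing ½u² + J_x u + ½P(u - ū)² over u gives u = (Pū - J_x)/(1 + P), so the penalty scales the step toward the unpenalized minimizer -J_x by 1/(1 + P). The update rule that follows is hypothetical: Appendix D is not reproduced here, so the ratio band [0.5, 2] and the shrink and grow factors are assumptions that only illustrate how comparing ΔJ with a(t_0) can decide whether to accept a step and whether to relax or tighten the penalty. P stands for one diagonal element of P_p; the evader's P_e would be handled symmetrically with the sign of Eq (3-6).

```python
def penalized_control(u_bar, Jx, P):
    """Minimizer over u of 0.5*u**2 + Jx*u + 0.5*P*(u - u_bar)**2.
    P = 0 gives the unpenalized minimizer -Jx; larger P keeps u near u_bar."""
    return (P * u_bar - Jx) / (1.0 + P)


def adjust_penalty(P, actual_change, predicted_change,
                   lo=0.5, hi=2.0, shrink=0.5, grow=4.0, P_min=1e-6):
    """Hypothetical convergence-control update (not the rule of Appendix D).

    Returns (new_P, accept_step). If the actual cost change agrees with the
    predicted change to within the ratio band [lo, hi], the first-order
    expansion is trusted: the step is accepted and P is relaxed. Otherwise
    the step is rejected and P is increased to shorten the next change."""
    if predicted_change == 0.0:
        return P, True
    ratio = actual_change / predicted_change
    if lo <= ratio <= hi:
        return max(P * shrink, P_min), True
    return max(P, P_min) * grow, False


print(penalized_control(0.0, 1.0, 0.0), penalized_control(0.0, 1.0, 3.0))  # → -1.0 -0.25
```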

IV. Intercept Problem


The specific problem under consideration is a
pursuit-evasion situation between an air-to-air missile and
an aircraft. This problem is modeled as a zero-sum
differential game with free final time.

Underlying  Concepts 

The  aircraft  is  modeled  with  the  stall  limit,  thrust, 
and  drag  dependent  upon  altitude  and  velocity.  The  missile 
is  a thrust-coast  air-to-air  missile  utilizing  an  infra-red 
seeker.  Missile  guidance  is  begun  at  the  termination  of 
the  boost  phase  with  drag  dependent  upon  altitude  and  velocity. 

The  game  "ground  rules"  are  as  follows: 

(a)  Each  vehicle  is  represented  as  a point  mass, 
maneuverable  in  three  dimensions. 

(b)  All  maneuvers  performed  are  flown  in  a 
co-ordinated  fashion. 

(c)  Gravity  is  represented  as  a constant  in  both 
magnitude  and  direction. 

(d)  Both  combatants  are  presumed  to  have  perfect 
knowledge  of  the  state  of  the  game  at  all  times. 

(e) With free final time and no terminal conditions
to meet, the game ends at the point where dJ/dt = 0.

Vehicle  Models 

Aircraft. The aircraft model is based upon the F-4.
The stall limit and thrust variations with velocity and
altitude are represented as polynomials, with the maximum
throttle setting used throughout. The equations of
motion are


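$$\begin{aligned}
\dot{x} &= V\cos\gamma\cos\sigma \\
\dot{y} &= V\cos\gamma\sin\sigma \\
\dot{h} &= V\sin\gamma \\
\dot{V} &= \frac{T\cos\alpha - D}{m} - g\sin\gamma \\
\dot{\gamma} &= \frac{(L + T\sin\alpha)\cos\mu}{mV} - \frac{g\cos\gamma}{V} \\
\dot{\sigma} &= \frac{(L + T\sin\alpha)\sin\mu}{mV\cos\gamma}
\end{aligned} \tag{4-1}$$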


where the variables are defined as follows:

x = distance in the north direction
y = distance in the west direction
h = altitude
V = magnitude of the velocity
γ = angle between the velocity vector and the local horizon
σ = angle between the projection of the velocity vector
    in the x-y plane and the x-axis
α = angle of attack (defined as the angle between the
    thrust and velocity vectors)
μ = bank angle
D = force due to drag
L = lift (perpendicular to the velocity vector)
g = acceleration due to gravity
T = thrust along the aircraft centerline
m = vehicle mass


Fig.  2.  State  Variable  Depiction 
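Eqs (4-1) translate directly into a state-derivative function that an integrator can call. The sketch below assumes English units (g = 32.174 ft/s²), takes the lift from the load factor as L = n m g, with n and the bank angle μ as the controls, and leaves thrust, drag, angle of attack, and mass to the caller; the names point_mass_rates and G are hypothetical.

```python
import math

G = 32.174  # ft/s^2; constant gravity per ground rule (c), English units assumed


def point_mass_rates(state, controls, thrust, drag, alpha, mass):
    """Right-hand side of the point-mass equations of motion, Eqs (4-1).

    state    = (x, y, h, V, gamma, sigma): north and west distance, altitude,
               speed, flight-path angle, heading angle (angles in radians)
    controls = (n, mu): load factor and bank angle
    thrust, drag, alpha, mass are supplied by the caller; thrust is 0 for
    the coasting missile.
    """
    x, y, h, V, gamma, sigma = state
    n, mu = controls
    lift = n * mass * G                          # load factor n = L / (m g)
    normal = lift + thrust * math.sin(alpha)     # force normal to the velocity
    return (
        V * math.cos(gamma) * math.cos(sigma),                           # x-dot
        V * math.cos(gamma) * math.sin(sigma),                           # y-dot
        V * math.sin(gamma),                                             # h-dot
        (thrust * math.cos(alpha) - drag) / mass - G * math.sin(gamma),  # V-dot
        normal * math.cos(mu) / (mass * V) - G * math.cos(gamma) / V,    # gamma-dot
        normal * math.sin(mu) / (mass * V * math.cos(gamma)),            # sigma-dot
    )
```

In straight and level one-g flight (n = 1, μ = 0, no thrust or drag) every rate except ẋ vanishes, and a 90° bank at one g turns the lift entirely into heading rate, so γ̇ = -g/V and σ̇ = g/V.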


The  controls  are  the  bank  angle  and  the  load  factor. 

No constraints are imposed upon the bank angle; however,
the load factor is constrained aerodynamically as a
function of altitude and airspeed, and limited structurally
to six g's. Expressed mathematically, the constraints are
written as follows:

$$C_1(n, h, V) \ge 0, \qquad C_2(n) \ge 0 \tag{4-2}$$

where C_1 is the aerodynamic constraint, C_2 is the structural
limit, and n is the load factor.

The force due to drag is

$$D = C_D\,q\,S \tag{4-3}$$

where the following relationships define the variables:

$$C_D = C_{D_0} + k\,C_L^2, \qquad q = \tfrac{1}{2}\rho V^2, \qquad S = \text{reference area} \tag{4-4}$$


Numerical  values  for  the  models  are  listed  in  Appendix  E. 
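A drag evaluation following Eqs (4-3) and (4-4) might look like the sketch below. The text does not give the lift coefficient explicitly; the standard definition C_L = L/(qS) is assumed, and the density ρ would come from an atmosphere model together with the numerical values of Appendix E, which are not reproduced here. The function name and the numbers in the check are placeholders.

```python
def drag_force(V, rho, S, CD0, k, lift):
    """Drag from Eqs (4-3)-(4-4): D = C_D q S with C_D = C_D0 + k*C_L**2
    and q = 0.5*rho*V**2. C_L = L/(q S) is the standard lift-coefficient
    definition, assumed here rather than taken from the text."""
    q = 0.5 * rho * V ** 2        # dynamic pressure
    CL = lift / (q * S)           # lift coefficient needed for the given lift
    CD = CD0 + k * CL ** 2        # parabolic drag polar
    return CD * q * S


# Placeholder values: q = 10, qS = 100, C_L = 0.5, C_D = 0.045
print(round(drag_force(100.0, 0.002, 10.0, 0.02, 0.1, 50.0), 6))  # → 4.5
```

Because C_L grows with the commanded lift, drag rises with the square of the load factor, which is why pulling g's bleeds the coasting missile's speed.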

Missile. The same equations of motion, Eqs (4-1),
are used in the missile model; however, the missile is
considered in the coast phase only, so the thrust, T,
is zero. No aerodynamic constraint is imposed upon the
missile load factor. The structural limit is set at
fifteen g's, with the corresponding constraint relation

$$C_3(n) \ge 0 \tag{4-5}$$


Selection  of  the  Cost  Function 

The  selection  of  a suitable  cost  function  is  probably 
the  most  subjective  portion  of  the  differential  game 
problem  formulation.  The  cost  function  used  in  this 
problem  is 

$$J = A R^2 - B\cos\psi + C t_f \tag{4-6}$$

where R represents the range between the aircraft and the
missile, ψ represents the angle between the velocity
vectors of the two players (track crossing angle), t_f is
the final time (the time at which dJ/dt = 0), and A, B, and
C are suitably selected weighting factors.

The selection of the R² term is based upon the fact
that the probability of kill, P_K, is primarily determined by
the miss distance, to which it is inversely related (Ref 9).
The R² term heavily penalizes the pursuer for failing
to close to within a small final range, making the
miss distance the predominant measure of warhead
effectiveness.

Some consideration must be given to the fuzing and
explosive pattern of the warhead. The destructive fragments
radiate outward from the explosion. Clearly, a proximity
fuze will be most apt to detonate the warhead within the
lethal range if the flight paths of the missile and aircraft
are closely aligned. The penalty associated with the
track crossing angle (-B cos ψ) reflects this consideration.
A small value of ψ results in a small penalty for the
pursuer, while a head-on attack results in the maximum
penalty, reflecting the difficulty of fuzing the warhead
at very large closure rates.

The  penalty  attached  to  the  final  time  is  added  to 
preclude  non-unique  solutions  when  the  missile  is  able  to 
accomplish  the  intercept.  This  term  is  not  significant 
unless  the  missile  is  able  to  reduce  the  final  cost  to  a 
very  small  number. 

The weighting factors are picked so that the range
predominates in the cost function until a distance of ten
feet is reached. The track crossing angle is significant
within the ten-foot range, while the t_f term becomes
significant as J and R approach zero. The following
weighting values are selected to reflect this:

A = i 

B = i o 

c = % 
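Eq (4-6) can be evaluated from the terminal positions and velocities of the two vehicles as sketched below, with the track crossing angle ψ computed as the angle between the velocity vectors. The weights are left as parameters; the values in the usage line are placeholders, not the weighting values selected above, and the function names are hypothetical.

```python
import math


def track_crossing_angle(v_missile, v_aircraft):
    """Angle psi (rad) between the two velocity vectors (3-tuples)."""
    dot = sum(a * b for a, b in zip(v_missile, v_aircraft))
    norm = (math.sqrt(sum(a * a for a in v_missile))
            * math.sqrt(sum(b * b for b in v_aircraft)))
    # clamp to guard against round-off pushing the cosine outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def payoff(r_missile, r_aircraft, v_missile, v_aircraft, tf, A, B, C):
    """Terminal payoff of Eq (4-6): J = A R^2 - B cos(psi) + C tf."""
    R2 = sum((a - b) ** 2 for a, b in zip(r_missile, r_aircraft))
    psi = track_crossing_angle(v_missile, v_aircraft)
    return A * R2 - B * math.cos(psi) + C * tf


# Placeholder weights: tail chase (psi = 0) versus head-on (psi = 180 deg), R = 3 ft
print(payoff((0, 0, 0), (3, 0, 0), (2000, 0, 0), (900, 0, 0), 4.0, 1.0, 2.0, 0.5))   # → 9.0
print(payoff((0, 0, 0), (3, 0, 0), (2000, 0, 0), (-900, 0, 0), 4.0, 1.0, 2.0, 0.5))  # → 13.0
```

At equal miss distance the head-on geometry costs the pursuer 2B more than the tail chase, which is the fuzing penalty described above.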

Application  of  Differential  Game  Theory 

The DDP equations, Eqs (3-3), are applied to the problem,
which allows the adjoint equations to be found as partial
derivatives of the Hamiltonian. For the problem under
consideration, the augmented Hamiltonian is

$$\tilde{H} = J_x f - \tfrac{1}{2}\Delta v^T P_e \Delta v + \tfrac{1}{2}\Delta u^T P_p \Delta u \tag{4-7}$$
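
The particular formulation of this problem allows the
λ̇ equation from Chapter II and the J̇_x equation from
Chapter III to be related as

$$J_x = \lambda^T \tag{4-8}$$

This allows Eq (4-7) to be written (summed over the states
of both vehicles) as

$$\tilde{H} = \lambda_x \dot{x} + \lambda_y \dot{y} + \lambda_h \dot{h} + \lambda_V \dot{V} + \lambda_\gamma \dot{\gamma} + \lambda_\sigma \dot{\sigma} - \tfrac{1}{2}\Delta v^T P_e \Delta v + \tfrac{1}{2}\Delta u^T P_p \Delta u \tag{4-9}$$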

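Employing Eqs (4-1), the co-state equations for each vehicle
(player subscripts omitted) become

$$\begin{aligned}
\dot{\lambda}_x &= 0, \qquad \dot{\lambda}_y = 0 \\
\dot{\lambda}_h &= -\frac{\partial \tilde{H}}{\partial h} \\
\dot{\lambda}_V &= -\frac{\partial \tilde{H}}{\partial V} \\
\dot{\lambda}_\gamma &= \lambda_x V\sin\gamma\cos\sigma + \lambda_y V\sin\gamma\sin\sigma - \lambda_h V\cos\gamma + \lambda_V g\cos\gamma - \frac{\lambda_\gamma g\sin\gamma}{V} - \frac{\lambda_\sigma (L + T\sin\alpha)\sin\mu\sin\gamma}{mV\cos^2\gamma} \\
\dot{\lambda}_\sigma &= \lambda_x V\cos\gamma\sin\sigma - \lambda_y V\cos\gamma\cos\sigma
\end{aligned} \tag{4-10}$$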

where

$$L = n\,m\,g, \qquad D = \tfrac{1}{2}\rho V^2 S\,(C_{D_0} + k\,C_L^2) \tag{4-11}$$

and the partial derivatives are

Adjustments to Eqs (4-9) must be made to compensate
for the load factor constraints, Eqs (4-2) and (4-5).
These equations may be written as

$$C_1(n, h, V) = n' - n, \qquad C_2(n) = 6 - n, \qquad C_3(n) = 15 - n \tag{4-13}$$

where n' is the aerodynamic (or lift coefficient) limit upon
the load factor of the aircraft, a polynomial function of
altitude and velocity:

$$n' = n'(h, V) \tag{4-14}$$


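For the occasions when the load factor is on the constraint
boundary, the changes to Eqs (4-10) are

$$\dot{\lambda}_h = -\frac{\partial \tilde{H}}{\partial h} - \nu_1\frac{\partial C_1}{\partial h}, \qquad \dot{\lambda}_V = -\frac{\partial \tilde{H}}{\partial V} - \nu_1\frac{\partial C_1}{\partial V}, \qquad \frac{\partial \tilde{H}}{\partial n} + \nu_1\frac{\partial C_1}{\partial n} = 0 \tag{4-15}$$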


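Since C_1 = n' - n, ∂C_1/∂n = -1, so ν_1 = ∂H̃/∂n, which gives
the following relationships for constrained load factors:

$$\dot{\lambda}_h = -\frac{\partial \tilde{H}}{\partial h} - \frac{\partial \tilde{H}}{\partial n}\frac{\partial C_1}{\partial h}, \qquad \dot{\lambda}_V = -\frac{\partial \tilde{H}}{\partial V} - \frac{\partial \tilde{H}}{\partial n}\frac{\partial C_1}{\partial V} \tag{4-16}$$

For unconstrained situations (where ν = 0), Eqs (4-10) hold
as stated.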


Optimal  Control  Solution 

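A requirement for an optimal control is that the
augmented Hamiltonian, H̃, be minimized for the pursuer
and maximized for the evader. The augmented Hamiltonian
can be written in separated form as

$$\tilde{H} = \tilde{H}_e + \tilde{H}_p$$

$$\tilde{H}_e = H_e - \tfrac{1}{2}P_{e1}\Delta\mu_e^2 - \tfrac{1}{2}P_{e2}\Delta n_e^2, \qquad \tilde{H}_p = H_p + \tfrac{1}{2}P_{p1}\Delta\mu_p^2 + \tfrac{1}{2}P_{p2}\Delta n_p^2 \tag{4-17}$$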

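The first-order condition for the evader's unconstrained
optimal bank angle is

$$\frac{\partial \tilde{H}_e}{\partial \mu_e} = K_e\left(\frac{\lambda_{\sigma e}\cos\mu_e}{\cos\gamma_e} - \lambda_{\gamma e}\sin\mu_e\right) - P_{e1}\Delta\mu_e = 0 \tag{4-18}$$

where P_{e1} is the bank-angle element of P_e and

$$K_e = \frac{L_e + T_e\sin\alpha_e}{m_e V_e}, \qquad \Delta\mu_e = \mu_e - \bar{\mu}_e \tag{4-19}$$

The application of Eq (4-18) with appropriate small-angle
approximations (small Δμ_e) results in

$$\Delta\mu_e = \frac{K_e\left(\lambda_{\sigma e}\cos\bar{\mu}_e/\cos\gamma_e - \lambda_{\gamma e}\sin\bar{\mu}_e\right)}{K_e\left(\lambda_{\gamma e}\cos\bar{\mu}_e + \lambda_{\sigma e}\sin\bar{\mu}_e/\cos\gamma_e\right) + P_{e1}} \tag{4-20}$$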


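The second-order condition

$$\frac{\partial^2 \tilde{H}_e}{\partial \mu_e^2} < 0 \tag{4-21}$$

gives

$$-\frac{L_e + T_e\sin\alpha_e}{m_e V_e}\left(\lambda_{\gamma e}\cos\mu_e + \frac{\lambda_{\sigma e}\sin\mu_e}{\cos\gamma_e}\right) - P_{e1} < 0 \tag{4-22}$$

where P_{e1} is the bank-angle element of P_e. The selection of
P_e must ensure that Eq (4-22) is satisfied.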
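The bank-angle computation can be sketched as one Newton step on the first-order condition ∂H̃_e/∂μ_e = 0, which is what linearizing sin μ_e and cos μ_e about the nominal bank angle μ̄_e amounts to. The sketch assumes that the μ-dependent part of H̃_e is K_e(λ_γ cos μ_e + λ_σ sin μ_e / cos γ_e) - ½P_e1(μ_e - μ̄_e)², with K_e = (L_e + T_e sin α_e)/(m_e V_e) taken from Eqs (4-1); the exact small-angle form used in the thesis may differ, and the function name is hypothetical. When the curvature fails the second-order test of Eq (4-22), the sketch keeps the nominal bank angle as a signal that P_e1 must be increased.

```python
import math


def evader_bank_update(mu_bar, lam_gamma, lam_sigma, gamma, K, P_e1):
    """One Newton step on dH~e/dmu = 0 for the evader's bank angle.

    Assumes the mu-dependent part of the augmented Hamiltonian is
        K*(lam_gamma*cos(mu) + lam_sigma*sin(mu)/cos(gamma))
        - 0.5*P_e1*(mu - mu_bar)**2
    with K = (L + T*sin(alpha)) / (m*V). Returns (mu, second_order_ok).
    """
    b = lam_sigma / math.cos(gamma)
    # gradient at mu_bar (the penalty term contributes nothing there)
    grad = K * (b * math.cos(mu_bar) - lam_gamma * math.sin(mu_bar))
    # curvature: must be negative for a maximum, cf. Eq (4-22)
    curv = -K * (lam_gamma * math.cos(mu_bar) + b * math.sin(mu_bar)) - P_e1
    if curv >= 0.0:
        # no interior maximum: keep the nominal bank angle; P_e1 must grow
        return mu_bar, False
    return mu_bar - grad / curv, True


print(evader_bank_update(0.0, 0.0, 1.0, 0.0, 1.0, 1.0))  # → (1.0, True)
```

The penalty P_e1 appears in the denominator of the step, so raising it both enforces the second-order condition and shortens the bank-angle change, which is the role the convergence control parameters play.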

A similar  approach  results  in  the  following  relationships 
for  computing 

The constrained load factor is found with an analogous
approach.  For  the  evader,  the  separated  terms  of  the 
augmented  Hamiltonian  are 


(4-25) 


The  application  of  the  first  order  necessary  conditions 
gives 


jH, 

where the partial derivatives are evaluated from Eqs (4-11).


Solving for Δn_e and assuming small angles of attack, α,
results in

The second-order conditions, Eq (4-21), are applied to


_K.\ TkNt  + cf< 

"»  Qs  *• 


The  same  approach  is  used  to  find 
following  results: 


with  the 


(4-27) 


give 


(4-28) 


(4-29) 


(4-30) 


Equation (4-28) is automatically satisfied if λ_Ve > 0,
and an interior load factor is possible in these circumstances.
Should the computed load factor, n̄_e + Δn_e, exceed the
constraints, Eqs (4-13), the load factor is set at its
maximum. An interior control for the pursuer's load factor
is possible if λ_Vp < 0, from Eq (4-30), and the pursuer's
load factor, n_p, is handled in exactly the same manner as
the evader's, in consideration of the pursuer's constraint.
A selection of zero values for the penalty parameters
reduces the problem to the general differential game
situation (Ref 11: 19-24).
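The constraint handling just described can be sketched as below: a computed load factor is clipped to the aircraft's min(n', 6) or the missile's fifteen-g limit, and the sign tests on λ_V quoted above decide whether an interior load factor is possible. The function names are hypothetical, and n' must be supplied from the polynomial fit of the aerodynamic limit.

```python
def limit_load_factor(n_cmd, n_struct, n_aero=None):
    """Clip a computed load factor to the constraints of Eqs (4-13).
    Aircraft: n_struct=6.0 and its aerodynamic limit n_aero = n'(h, V).
    Missile:  n_struct=15.0 and no aerodynamic limit.
    Returns (n, on_boundary)."""
    n_max = n_struct if n_aero is None else min(n_struct, n_aero)
    if n_cmd > n_max:
        return n_max, True      # constraint active: set at maximum
    return n_cmd, False


def interior_load_factor_possible(lam_V, player):
    """Sign tests from the text: lam_V > 0 for the evader, lam_V < 0 for the pursuer."""
    if player == "evader":
        return lam_V > 0.0
    if player == "pursuer":
        return lam_V < 0.0
    raise ValueError("player must be 'evader' or 'pursuer'")


print(limit_load_factor(8.0, 6.0, 4.5), limit_load_factor(20.0, 15.0))  # → (4.5, True) (15.0, True)
```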

Proportional  Navigation  Guidance 

A proportional navigation scheme was employed against
the DDP-guided evader to rate the performance of
differential game guidance. The proportional navigation
pursuer was also allowed to use perfect information to
determine the rate of change of the line-of-sight between
the two vehicles. Two angles were used to determine the
line-of-sight (LOS), as indicated in Fig. 3 (Ref 11: 25).
Relationships for the angles are

Fig. 3. Line-of-Sight Angle Determination


The  time  derivatives  are 

Solving yields

If the calculated value of n exceeded the structural limit,
n was  set  to  fifteen. 
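The proportional navigation pursuer's command can be sketched with one common textbook form, n = N V_c |Ω| / g, where Ω is the line-of-sight rotation rate vector and V_c the closing speed, capped at the fifteen-g structural limit as above. This vector form is a substitute for the two-angle formulation of Fig. 3, whose exact relationships are not reproduced here; the navigation constant N = 4 and the function names are assumptions.

```python
import math

G = 32.174  # ft/s^2, English units assumed


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pn_load_factor(r_m, v_m, r_t, v_t, N=4.0, n_max=15.0):
    """Textbook proportional navigation: n = N * Vc * |LOS rate| / g,
    capped at the missile's structural limit n_max."""
    r = [t - m for t, m in zip(r_t, r_m)]          # missile-to-target LOS vector
    v = [t - m for t, m in zip(v_t, v_m)]          # relative velocity
    r2 = sum(c * c for c in r)
    los_rate = math.sqrt(sum(c * c for c in cross(r, v))) / r2   # rad/s
    closing_speed = -sum(a * b for a, b in zip(r, v)) / math.sqrt(r2)
    n = N * closing_speed * los_rate / G
    return min(n, n_max)


# Missile flying +x at 2000 ft/s; crossing target 10,000 ft ahead moving +y at 800 ft/s.
# LOS rate = 0.08 rad/s and Vc = 2000 ft/s, so N = 3 gives about 14.9 g.
print(pn_load_factor((0, 0, 0), (2000, 0, 0), (10000, 0, 0), (0, 800, 0), N=3.0))
```

With the assumed N = 4 the same geometry would demand about 19.9 g, so the command saturates at fifteen, mirroring the limit applied in the text.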

To  contend  with  situations  where  ® is  small,  Eq  (4-34) 

gives 

V. Results


The  results  obtained  are  derived  from  the  closed-loop 
application  of  the  DDP  algorithm  as  explained  in  Appendix  B 
and  Chapter  III,  and  from  the  proportional  navigation  scheme 
of  Chapter  IV  against  an  evader  employing  the  DDP  algorithm 
in a closed-loop fashion. Several TPBVP solutions were
available  (Ref  11)  and  these  allowed  the  algorithm  to  be 
compared  to  a known  reference  as  a check  of  accuracy. 

DDP Closed-Loop Application

The intercept problem was attempted first without
convergence control. It was found that large control
changes, Δu(t) and Δv(t), resulted, causing trajectory
changes, Δx(t), to be too large, thus violating the
linearization of Eq (3-1) and preventing a solution from
being found. Restricting the problem to extremely favorable
initial pursuer positions did not help the situation, so
the CCP technique was included in the algorithm.

The closed-loop guidance scheme used in this problem
solution utilizes five iterations to determine the control
strategies, and the combatants' applied controls are updated
at .5 second intervals. The integration routine uses a time
step, dt, of .02 seconds. This formulation does not result in a
real-time guidance philosophy with current computer technology
and is certainly not applicable to a small air-to-air missile;
however, it does avoid the traditional, long-execution-time,
nonlinear TPBVP solution methods (Ref 11). The resulting
trajectories appear to follow reasonable logic. The aircraft
tries to cross the path of the missile and normally descends
to gain a better turning rate and to take advantage of the
higher thrust available. This result is consistent with normal
air-to-air evasion tactics and parallels the findings of Ref 11.

For purposes of this study, a kill is defined as a pass
within ten feet of the aircraft at any value of track
crossing angle, ψ, or a pass within fifteen feet at
values of ψ below 45°.
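The kill definition can be stated as a one-line predicate. The sketch treats "within" as inclusive, which is an assumption, and the function name is hypothetical; the checks reuse outcomes reported for Case 2 below (a 13 ft miss at ψ = 11.5° counts as a kill, a 237 ft miss does not).

```python
def is_kill(miss_ft, psi_deg):
    """Kill if the pass is within 10 ft at any track crossing angle psi,
    or within 15 ft when psi is below 45 degrees."""
    return miss_ft <= 10.0 or (miss_ft <= 15.0 and psi_deg < 45.0)


print(is_kill(13.0, 11.5), is_kill(13.0, 60.0))  # → True False
```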

Case 1. The first situation considers an attack from
directly behind the aircraft (six o'clock position). The
altitude for both combatants is 33,000 feet, and the initial
controls for both are zero bank angle and one "g" (straight
and level). The DDP algorithm is applied in a closed-loop
manner as indicated (five iterations between control
updates, with .5 second updating intervals). The result
is that the aircraft attempts a straight-ahead (no bank)
maximum "g" climb. This makes sense if consideration is
given to the speed and load factor advantage of the missile.
By attempting a straight-ahead climb, the aircraft tries to
capitalize on the zero-thrust situation of the missile;
however, a kill is scored in all situations until the
missile is initially positioned far enough behind the
aircraft that its airspeed is depleted before the intercept
can be completed.

This test case confirms what experience has proven in air-to-air combat: if an evader is unaware of an attack upon him and is not maneuvering at the time missile guidance begins, he will most likely be destroyed.

Case  2.  This  situation  is  initially  the  same  relative 
position  as  in  Case  1;  however,  the  initial  controls  are 
changed.  The  pursuer  again  employs  zero  bank  and  one  "g" 
but  the  evader  flies  a fixed  90°  bank  and  three  "g's" 
throughout.  The  DDP  algorithm  is  used  as  the  pursuer's 
guidance  law  and  the  evader  flies  the  initial  controls  with 
no  updating. 

The  first  application  used  an  integration  step  size  of 
.02  seconds,  an  updating  interval  of  1.5  seconds,  and  15 
iterations  between  updates.  The  result  is  a 237  foot  miss 
at  the  end  of  the  game. 

A second attempt with DDP guidance, using a one second updating interval and ten iterations, results in a 13 foot miss at a track crossing angle of 11.5°, a kill.

Finally, a third application using five iterations and a .5 second updating interval gives a four foot miss at a track crossing angle of 11.5°, within the lethal envelope.

This  analysis  shows  that  an  improvement  is  realized  by 
taking  a smaller  updating  interval  and  retaining  enough 
iterations  (in  this  case  five)  to  derive  "near  optimal" 
controls  without  undue  computational  time.  In  the  large 
updating  interval  case,  the  "near-optimal"  controls  derived 
are  not  close  enough  to  the  optimal  and,  when  applied  for 
long  periods  (one  second  and  more),  cause  poor  results. 
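The three Case 2 settings can be compared with a little bookkeeping. In each, the number of iterations divided by the updating interval works out to ten iterations per second of engagement time, which suggests that the improvement comes from refreshing the controls more often rather than from iterating more in total. The sketch below only does this arithmetic with the stated .02 second step size; the helper name `schedule` is hypothetical, and it deliberately says nothing about the cost of an individual iteration, which in the actual algorithm also depends on the time remaining in the game.

```python
from fractions import Fraction

DT = Fraction(2, 100)  # integration step size, seconds

# (updating interval in seconds, DDP iterations between updates) for Case 2
settings = [(Fraction(3, 2), 15), (Fraction(1), 10), (Fraction(1, 2), 5)]


def schedule(interval, iterations, dt=DT):
    """Return (iterations per second of game time, integration steps per update)."""
    return iterations / interval, interval / dt


for interval, iterations in settings:
    per_sec, steps = schedule(interval, iterations)
    print(f"interval {float(interval):.1f} s: {per_sec} iterations/s, "
          f"controls held for {steps} integration steps")
```

Exact fractions are used so that the .02 second step divides the intervals without floating-point residue.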
In the explanation of the DDP method (Chapter III), it was stated that the predicted cost changes, a_e(t_0) and a_p(t_0), should be small. In the 1.5 second updating application, the magnitudes of a_e and a_p after five iterations are on the order of 50,000 and decrease to 5330 by the end of the game. For the one second updating, they are 49,000 after five iterations and decrease to 200, while with a .5 second interval the predicted changes after the iterations are 32,000 but decrease to 10^?.

It  can  be  concluded  that  forcing  the  evader  to  use  a 
particular  control  and  allowing  the  pursuer  to  update  with 
the  DDP  algorithm  on  short  time  intervals  will  result  in  a 
successful  intercept;  however,  this  is  not  achieved  in 
real-time,  even  with  the  CDC  6600  computer. 

Case 3. In this simulation, both pursuer and evader use the DDP algorithm to update and compute control histories. Although a real-time solution is not achieved, the computations are based upon an integration step size of .02 seconds, a .5 second updating period, and five iterations between updates. Three particular situations are presented in detail as representative examples.

The first situation is that of Table XI of Appendix A. The selected initial controls place the aircraft in a 90° bank, two "g" turn into the missile. The pursuer's initial control is a zero bank, one "g" path aimed ahead of the aircraft. This control was selected to reflect the fact that the pursuer has no idea what the aircraft will do. It is very similar to the way a hunter would aim at a duck, with some slight lead based on the present control of the evader, and it does not give any inherent advantage to the missile.

The missile closes to within 15 feet at the intercept point with a track crossing angle of 40°, and the resulting flight path is depicted in Figure 4. Table I lists the relative values of actual versus predicted cost changes.

The individual predicted cost changes start at magnitudes on the order of 100,000 and slowly decrease with successive iterations until they reach .5 at the end of the game.


Table I

Convergence Characteristics

Time (Secs)    ΔJ/a(t_0)    Predicted Final Miss Distance (ft)
    .5            .13             646
   1.0            .85             492
   1.5            .91             288
   2.0            .85             129
   2.5            .77              50
   3.0            .67              19
   3.5            .71              15
A second data set, Table XII, is run in the same manner, but the initial evader control is now a 90° bank and two "g's" away from the pursuer. This problem terminates in an 11 foot miss at a track crossing angle of 40°. Table II depicts the convergence characteristics of this problem. The values of a_e(t_0) and a_p(t_0) start at 42,000 and decrease to 10^?. This trajectory is shown in Figure 6.


Table II

Convergence Characteristics

Time (Secs)    ΔJ/a(t_0)    Predicted Final Miss Distance (ft)
    .5            .42             663
   1.0            .91             544
   1.5            .92             305
   2.0            .86             142
   2.5            .79              61
   3.0            .68              29
   3.5            .96              16
   4.0            .67              12
Finally, Table XIII is used to begin an intercept problem. In this case, the aircraft is in a 30° dive using a 90° bank and two "g's" into the pursuer. This helps the aircraft, since it is gaining both airspeed and control authority, and the miss distance is 578 feet, with the convergence characteristics presented in Table III. The individual cost change values begin at 350,300 and decrease to 10^-1. The trajectory is depicted in Figure 8.


Table III

Convergence Characteristics

Time (Secs)    ΔJ/a(t_0)    Predicted Final Miss Distance (ft)
    .5           1.54            1227
   1.0           1.00            146?
   1.5           1.11            1483
   2.0           1.25            1489
   2.5            .95            1487
   3.0            .99            1468
   3.5            .99            1413
   4.0            .99            1295
   4.5            .99            1077
   5.0            .90             760
   5.5             ?              613
   6.0            .70             581
   6.5            .54             579


The bank angle histories for the three cases are in Tables IV, V, and VI.


Table IV

Bank Angle History

Time (Secs)    Pursuer Bank (Deg)    Evader Bank (Deg)
    .5               42                   32
   1.0               81                   82
   1.5               74                   64
   2.0               82                   75
   2.5               86                   78
   3.0               81                   84
   3.5               84                   86
Table V

Bank Angle History

Time (Secs)    Pursuer Bank (Deg)    Evader Bank (Deg)
    .5               41                   50
   1.0               86                   81
   1.5               78                   70
   2.0               81                   74
   2.5               83                   75
   3.0               85                   77
   3.5               86                   76
   4.0               87                   77
Table VI

Bank Angle History

Time (Secs)    Pursuer Bank (Deg)    Evader Bank (Deg)
    .5               76                   82
   1.0               55                   30
   1.5               52                   25
   2.0               51                   24
   2.5               50                   22
   3.0               50                   22
   3.5               51                   22
   4.0               51                   25
   4.5               54                   32
   5.0               58                   44
   5.5               62                   55
   6.0               65                   62
   6.5               66                   63

It can be seen that there are no significant discontinuities in the updating of the controls; however, it is also apparent that the controls remain nonoptimal for a good portion of the intercept. If more iterations were used between control updates, better controls would be derived, but at the expense of a real-time implementation. In general, five iterations are not sufficient for determining near-optimal controls when the initial control "guesses" are not close to the optimal.

The final stage of the intercept problem is the most sensitive. Rapid, large control changes are demanded of the missile, reflecting its attempt to line up the flight paths (null the track crossing angle) and eliminate the terminal miss distance while the state values are changing rapidly. To overcome this, whenever the time remaining is less than one full updating period, the missile uses the last control selected.

The load factor selected by the guidance scheme was largely determined by the penalty values, P_e and P_p. Interior controls result for the missile for a large portion of the flight, while the aircraft normally reaches its maximum load factor first. The values of P_e and P_p must be picked large enough to ensure convergence; however, large values also cause slow convergence, as reflected in the values of a_e(t_0) and a_p(t_0) previously discussed.
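The trade-off in choosing the penalty values can be seen in a scalar toy problem. The sketch below is not the formulation used in this study: it assumes a cost f(u) = 2u² whose local model underestimates the curvature (model curvature 1 instead of the true value 4), the kind of mismatch that arises when the expansion is not valid, and it takes penalized steps Δu = -f'(u)/(1 + P). The names `iterations_to_converge`, `MODEL_CURVATURE`, and `TRUE_CURVATURE` are hypothetical.

```python
MODEL_CURVATURE = 1.0   # curvature assumed by the local (expansion) model
TRUE_CURVATURE = 4.0    # actual second derivative of f(u) = 2 u**2


def iterations_to_converge(penalty, u0=1.0, tol=1e-6, max_iter=10_000):
    """Count penalized iterations until |u| < tol; return None if no convergence."""
    u = u0
    for k in range(1, max_iter + 1):
        grad = TRUE_CURVATURE * u                 # f'(u) = 4u
        u -= grad / (MODEL_CURVATURE + penalty)   # step shrinks as the penalty grows
        if abs(u) < tol:
            return k
        if abs(u) > 1e6:                          # steps overshoot and grow
            return None
    return None


for p in (0.0, 1.0, 3.0, 10.0, 100.0):
    print(f"P = {p:5.1f}: {iterations_to_converge(p)}")
```

With P = 0 the iteration diverges and with P = 1 it oscillates without converging; P = 3 happens to cancel the model error and converges in one step, while P = 10 and P = 100 converge but need progressively more iterations, which parallels the slow decrease of a_e(t_0) and a_p(t_0) when the penalties are set large.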

Proportional  Navigation  Guided  Pursuer 

To provide a comparison with the DDP performance, the pursuit-evasion problem was re-solved in a closed-loop fashion, with the evader employing DDP-derived controls and the pursuer relying upon proportional navigation. The same initial states and controls were used (Tables XI - XXII). The proportionality constants, RP1 and RP2, are set at ten. This represents a performance level which exceeds that obtainable with current technology; however, the value of ten was selected to reflect the future capabilities of improved proportional navigation methods.
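For reference, the classical planar form of proportional navigation commands a heading rate equal to a navigation constant N times the line-of-sight (LOS) rate. The sketch below assumes a two-dimensional, point-mass, constant-speed engagement with no load-factor limit and a non-maneuvering target; it is not the three-dimensional mechanization used in this study, and the way RP1 and RP2 enter that mechanization is not reproduced. N = 10 matches the value chosen above, the speeds are the Table XI values, and the geometry and function names are hypothetical.

```python
import math


def los_rate(rx, ry, vx, vy):
    """LOS angular rate (rad/s) from relative position and velocity (target minus pursuer)."""
    return (rx * vy - ry * vx) / (rx * rx + ry * ry)


def pn_turn_rate(n, rx, ry, vx, vy):
    """Pure proportional navigation: commanded heading rate = N * LOS rate."""
    return n * los_rate(rx, ry, vx, vy)


def simulate(n=10.0, dt=1e-3, t_max=20.0):
    """Planar PN engagement against a non-maneuvering target; returns the miss distance (ft)."""
    px, py, p_head, p_speed = 0.0, 0.0, 0.0, 2219.0                # pursuer
    tx, ty, t_head, t_speed = 10000.0, 3000.0, math.pi / 2, 706.0   # target
    miss = math.hypot(tx - px, ty - py)
    t = 0.0
    while t < t_max:
        rx, ry = tx - px, ty - py
        vx = t_speed * math.cos(t_head) - p_speed * math.cos(p_head)
        vy = t_speed * math.sin(t_head) - p_speed * math.sin(p_head)
        if rx * vx + ry * vy > 0.0:          # range is opening: closest approach passed
            break
        p_head += pn_turn_rate(n, rx, ry, vx, vy) * dt
        px += p_speed * math.cos(p_head) * dt
        py += p_speed * math.sin(p_head) * dt
        tx += t_speed * math.cos(t_head) * dt
        ty += t_speed * math.sin(t_head) * dt
        miss = min(miss, math.hypot(tx - px, ty - py))
        t += dt
    return miss


print(f"miss distance: {simulate():.1f} ft")
```

Because the turn rate is unlimited here, the sketch cannot show the drag and load-factor penalties discussed below; it only illustrates the guidance law itself.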

In Appendix A, each proportional navigation trajectory follows the corresponding DDP trajectory. In each situation, the missile is able to close to lethal range. The aircraft once again attempts to cross the pursuer's path, but the missile is able to keep inside of the turn. This is due to the selection of ten as the proportionality constant and to the fact that the evader is using the DDP guidance law, which is not optimal.

Both combatants use the maximum load factor throughout, which allows the missile more control authority, but at a great expense in drag. The more unfavorable the initial position, the larger the final range. This situation is expected and reflects missile launches made from outside of the "firing envelope," where the drag penalty defeats the missile.

Tables VII, VIII, and IX present the bank angles resulting from the conditions of Tables XI, XII, and XIII, with the trajectories depicted in Figures 5, 7, and 9.


Table VII

Bank Angle History

Time (Secs)    Pursuer Bank (Deg)    Evader Bank (Deg)
    .5              182                   87
   1.0               72                   87
   1.5              184                   89
   2.0                7                   89
   2.5              180                   88
   3.0              178                   89
   3.5               13                   90
   4.0              154                   90
Table VIII

Bank Angle History

Time (Secs)    Pursuer Bank (Deg)    Evader Bank (Deg)
    .5              -61                  -59
   1.0              -21                  -59
   1.5              136                  -59
   2.0              -40                  -59
   2.5              -44                  -59
   3.0              143                  -60
   3.5              130                  -60
   4.0              -71                  -60
   4.5              -63                  -59

Table IX

Bank Angle History

Time (Secs)    Pursuer Bank (Deg)    Evader Bank (Deg)
    .5              -51                   20
   1.0               88                   14
   1.5               26                   18
   2.0               19                   16
   2.5               52                   19
   3.0               61                   22
   3.5               96                   29
   4.0              104                   40
   4.5              100                   37
   5.0              110                   39
   5.5              112                   46
   6.0               53                   28


Open-Loop Comparison

The exact TPBVP solutions for three intercept problems, yielding open-loop controls, were obtained (Ref 11). These situations are listed in Tables XX - XXII. The two guidance schemes previously discussed (DDP and proportional navigation) were applied to these problems; however, the initial controls for both players are the saddle-point controls from the TPBVP solution. Again, five iterations between .5 second updates are used, with the resulting trajectories displayed in Figures 22 - 30.

For the situation of Table XX, the open-loop controls give a terminal miss of 20 feet, while the DDP method results in a 177 foot miss and the proportional navigation scheme results in a 126 foot miss. The control comparison is listed in Table X. The predicted cost changes, a_e(t_0) and a_p(t_0), start at 240,000 and decrease to 60.
p o 

Since the trajectories of Figures 22 - 30 are very close in all cases, this analysis indicates that the DDP method can converge to a near-optimal solution if the nominal controls selected are very close to the optimal. In actual applications this is very difficult to do, since there is no sound basis for arriving at good initial control guesses without long computer solutions (Ref 11).
Table X

Control Comparison

Time (Secs)    Optimal Pursuer (Deg)    DDP Pursuer (Deg)    Prop Nav Pursuer (Deg)
    .5                25                      21                     16
   1.0                25                      22                     19
   1.5                25                      25                     22
   2.0                25                      25                     25
   2.5                25                      24                     28
   3.0                25                      24                     51
   3.5                29                      25                     54
   4.0                55                      22                     58
   4.5                59                      20                     44
   5.0                45                      20                     55
   5.5                55                      29                     72
   6.0                75                       -                      -
VI. Conclusions and Recommendations


Conclusions 

The DDP method has been applied to a nonlinear air-to-air intercept problem modeled as a differential game. A variety of initial positions has been examined, as indicated in Chapter V and Appendix A, and a real-time, closed-loop implementation of the guidance law was not realized. The trajectory deviations which arise from large computed control changes, Δu and Δv, are too large and force the adoption of the CCP technique. Although this assures convergence, the computation time is not suitable for real-time applications.

The restriction to five iterations does not allow near-optimal controls to be found, owing to the slow convergence characteristics evidenced by the large values of a_e(t_0) and a_p(t_0) given in Chapter V. It is concluded that any real-time guidance algorithm (for a short-duration air-to-air missile) which hopes to achieve an increase in kill probability through the use of differential game theory will have to sacrifice some of the realism gained through the inclusion of nonlinear dynamics.

The algorithm demonstrates a strong reliance on the initial controls: the closer the initial controls are to the optimal, the better (more nearly optimal) the DDP method solutions. The possibility of storing several near-optimal control histories aboard a small missile is doubtful, and merely guessing one does not lead to good results.
The tactics which arise in this closed-loop simulation are very similar to those used in air-to-air combat situations. The use of DDP, therefore, may be worthwhile in an off-line computer simulation to test selected tactics.

Finally, it is concluded that although the DDP method eliminates the requirement of the traditional iterative (neighboring extremal) TPBVP solution for very good "guesses" of the initial co-states, a strong dependence on initial conditions remains. The solution is very problem-dependent: in high-altitude intercepts (above 30,000 feet) the aircraft is penalized by lower thrust and a reduction in the maximum load factor. At lower altitudes, the aircraft's improved performance and the nonoptimality resulting from the five-iteration restriction allow the aircraft to defeat the missile, as shown in Figures 10 and 18.

It has been determined (Chapter V) that the DDP method applied to the aircraft improves the evasion (increases the final range) over a particular preselected strategy which employs no updating. The final set of trajectories, Figures 22 - 30, shows that the solution will converge to the optimal, given sufficient time. This "long-time" requirement precludes real-time implementation.

Recommendations 

It is recommended that further study in this area be directed toward simplification of the dynamics. A reduction in the complexity of the models, while not completely relinquishing their basic characteristics, may allow some of the potential gains inherent in the differential game philosophy to be achieved.

Beyond the model simplification, an extension of the intercept problem to include the case of multiple missile launches may provide some useful tactical information. Presently, the results confirm accepted logic for air-to-air counter-maneuvers. Perhaps a new or revised tactic may be discovered through simulation of different encounters.
Appendix A
Trajectory Analysis

The following graphs depict the three-dimensional flight paths and ground tracks of the missile and aircraft resulting from a closed-loop application of the guidance scheme. In Figures 4 - 21, the aircraft uses the DDP method while the missile uses either the DDP algorithm or proportional navigation, as indicated on each graph. Figures 22 - 30 are the exact TPBVP solution comparisons, with Figures 22, 25, and 28 depicting the "exact" optimal solutions. The initial conditions for each set of graphs are given in the preceding tables, Tables XI - XXII.


Table XI

Initial Conditions

State           Evader      Pursuer
x (ft)            5000        -1000
y (ft)            5000         6000
z (ft)           33160        33160
v (ft/sec)         706         2219
γ (rads)           0.0         -.01
σ (rads)          .524          .01
Control
n (g's)              2            1
μ (rads)          1.57          0.0
[Figure: DDP Evasion-DDP Pursuit]

[Figure: DDP Evasion-Prop Nav Pursuit]


Table XII

Initial Conditions

State           Evader      Pursuer
x (ft)           10000      [illegible]
y (ft)           10000         9000
z (ft)           33160        33160
v (ft/sec)         706         2219
γ (rads)           0.0         -.01
σ (rads)         -.524      [illegible]
Control
n (g's)              2            1
μ (rads)          1.57          0.0


Table XIII

Initial Conditions

State           Evader      Pursuer
x (ft)           10000         2000
y (ft)           10000         9000
z (ft)           33000        34000
v (ft/sec)         706         2219
γ (rads)         -.524          -.5
σ (rads)         -.524          .01
Control


[Figure: DDP Evasion-DDP Pursuit. Legend: □ evader; o pursuer; Δ ground track (evader); + ground track (pursuer)]


Table XIV

Initial Conditions

State           Evader      Pursuer
x (ft)            5000          0.0
y (ft)            5000         6000
z (ft)           15000        15000
v (ft/sec)         770         2065
γ (rads)           -.1         -.02
σ (rads)          .524          .01
Control
n (g's)              2            1
μ (rads)          1.05          0.0

Table XV

Initial Conditions

State           Evader      Pursuer
x (ft)           10000        12000
y (ft)           10000         2000
z (ft)           33160        33160
v (ft/sec)         706         2219
γ (rads)           0.0         -.01
σ (rads)           0.0         1.57
Control
n (g's)              2            1
μ (rads)          1.57          0.0


[Figure: DDP Evasion-DDP Pursuit]


Table XVI

Initial Conditions

State           Evader      Pursuer
x (ft)           10000         4500
y (ft)           10000        11000
z (ft)           15000        15000
v (ft/sec)         770         2065
γ (rads)            .1          .11
σ (rads)          .524          .01
Control
n (g's)              3            1
μ (rads)          .524          0.0

[Figure: DDP Evasion-Prop Nav Pursuit]


Table XVII

Initial Conditions

State           Evader      Pursuer
x (ft)           10000         3000
y (ft)           10000         9000
z (ft)           33160        33160
v (ft/sec)         706         2219
γ (rads)           0.0         -.01
σ (rads)         -.524        -.001
Control
n (g's)              2            1
μ (rads)          1.57          0.0
Table XVIII

Initial Conditions

State           Evader      Pursuer
x (ft)           10000         5000
y (ft)           10000         9000
z (ft)           11000        11000
v (ft/sec)         850         2350
γ (rads)           0.0         -.01
σ (rads)         -.524         -.01
Control
n (g's)              2            1
μ (rads)          1.57          0.0

[Figure: DDP Evasion-DDP Pursuit]
Table XIX

Initial Conditions

State           Evader      Pursuer
x (ft)           10000         4000
y (ft)           10000        11000
z (ft)           33160        33160
v (ft/sec)         706         2219
γ (rads)           0.0         -.01
σ (rads)          .524          .01
Control
n (g's)              3            1
μ (rads)          1.05          0.0
Table XX

Initial Conditions

State           Evader      Pursuer
x (ft)            4178        10920
y (ft)            4435        10242
z (ft)           11123         7762
v (ft/sec)         849         2491
γ (rads)           -.5         -.29
σ (rads)          -1.2        -2.18
Control
n (g's)              6           15
μ (rads)          1.05         .524


[Fig. 22: Optimal Evasion-Optimal Pursuit]


Table XXI

Initial Conditions

State           Evader      Pursuer
x (ft)            4936         9341
y (ft)            3013        10358
z (ft)           33198        33326
v (ft/sec)         703         2235
γ (rads)          -.84         -.62
σ (rads)          -.64        -2.43
Control
n (g's)              2           15
μ (rads)           0.0         1.22


[Figure: DDP Evasion-Prop Nav Pursuit]


Table XXII

Initial Conditions

State           Evader      Pursuer
x (ft)            4972         9199
y (ft)            2985        10260
z (ft)           33150        33170
v (ft/sec)         706         2219
γ (rads)         -.846        -.619
σ (rads)         -.632        -2.41
Control
n (g's)              2           15
μ (rads)           0.0         1.22
Appendix B

Application of the Algorithm

The following discussion explains the application of the DDP algorithm in a closed-loop manner:

(a)  Nominal  controls  are  used  in  the  forward 
integration  of  Eq  (2-1)  as  previously  indicated.  Penalty 
values  are  selected  prior  to  the  integration  and  the 
mechanization  discussed  on  page  11  is  carried  out. 

(b) The predicted and actual cost changes are compared as explained in Appendix D, and the penalty values are adjusted. If regions A or A' are encountered, the iteration is rejected and the nominal controls are used again with an adjusted penalty value. If the comparison falls outside regions A and A', the nominal control is replaced by the control obtained during the backward integration and the penalties are adjusted.

(c) The controls derived during the backward integration are exchanged for the controls used during the forward integration as long as the cost ratios satisfy the rules of Appendix D. The iterations are continued for some predetermined number.

(d) After the required number of iterations is reached, the resulting controls are applied for a specified time interval. The state of each combatant at the end of the time interval becomes the initial condition for the next iteration cycle, and the process is repeated.


(e)  The  intercept  is  completed  when  the  derivative 
of  the  cost  goes  to  zero.  The  state  at  each  updating  point 
is  used  to  obtain  the  trajectories  of  Appendix  A.  The  evader 
and  pursuer  are  able  to  base  the  choice  of  controls  on  the 
more  current,  updated  state,  thereby  giving  closed-loop 
guidance. 
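Steps (a) through (e) have the structure of a receding-horizon loop. The sketch below shows only that outer structure under stated assumptions: the DDP iteration itself is replaced by a caller-supplied placeholder, the engagement is reduced to a toy scalar state with a fixed final time (the actual game has a free final time), and all names (`closed_loop`, `improve`, `f`) are hypothetical. It also includes the terminal rule of Chapter V, under which the last selected control is held when less than one full updating period remains.

```python
from typing import Callable, List, Tuple


def closed_loop(
    x0: float,
    u0: float,
    improve: Callable[[float, float], float],  # one DDP-style iteration (placeholder)
    f: Callable[[float, float], float],        # state derivative dx/dt = f(x, u)
    t_final: float,                            # toy problem uses a fixed final time
    n_iter: int = 5,
    update_interval: float = 0.5,
    dt: float = 0.02,
) -> List[Tuple[float, float, float]]:
    """Receding-horizon loop patterned on steps (a)-(e) of Appendix B."""
    steps_per_update = round(update_interval / dt)
    total_steps = round(t_final / dt)
    x, u, k = x0, u0, 0
    history = [(0.0, x, u)]
    while k < total_steps:
        remaining = total_steps - k
        # (a)-(c): iterate from the current state, starting from the last controls,
        # unless less than one full updating period remains (then hold the last control).
        if remaining >= steps_per_update:
            for _ in range(n_iter):
                u = improve(x, u)
        # (d): apply the controls over one updating interval (Euler integration).
        for _ in range(min(steps_per_update, remaining)):
            x += f(x, u) * dt
            k += 1
        # (e): the state at each updating point traces out the closed-loop trajectory.
        history.append((k * dt, x, u))
    return history


# Hypothetical toy problem: drive x to zero; the "optimal" feedback is u = -2x,
# and each iteration closes half of the gap between u and that value.
def improve(x, u):
    return u + 0.5 * (-2.0 * x - u)


traj = closed_loop(x0=1.0, u0=0.0, improve=improve, f=lambda x, u: u, t_final=3.2)
for t, x, u in traj:
    print(f"t = {t:4.2f} s  x = {x:+.6f}  u = {u:+.6f}")
```

In the toy run the final .2 seconds is shorter than one updating period, so the last control is held rather than recomputed, as in the terminal rule.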
Appendix C

Derivation of the DDP Equations


It is assumed that nominal controls, ū(t) and v̄(t), exist which result in a nominal trajectory of Eq (2-1), x̄(t). It is further assumed that this trajectory is reasonably close to the optimal solution, x*(t). If the optimal solution is written as

$$x^*(t) = \bar x(t) + \Delta x(t) \qquad \text{(C-1)}$$

Eq (3-1) can be written in terms of the nominal as follows:

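$$\frac{\partial J^*}{\partial t}(\bar x+\Delta x,t) + \Big[L(\bar x+\Delta x,u^*,v^*;t) + J_x^{*T}(\bar x+\Delta x,t)\,f(\bar x+\Delta x,u^*,v^*;t)\Big] = 0 \qquad \text{(C-2)}$$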


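Equation (C-2) is expanded in Δx, resulting in

$$\frac{\partial J^*}{\partial t}(\bar x,t) + \frac{\partial J_x^{*T}}{\partial t}(\bar x,t)\,\Delta x + H(\bar x,u^*,v^*,J_x^*;t) + H_x^{T}(\bar x,u^*,v^*,J_x^*;t)\,\Delta x + \left[J_{xx}^*\,\Delta x\right]^{T} f(\bar x,u^*,v^*;t) + R' = 0 \qquad \text{(C-3)}$$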

If Δx is small, the remainder term, R', can be neglected, since it represents terms of second order and higher. Since Eq (C-3) must hold for any choice of Δx, the following relationships hold:

$$\frac{\partial J^*}{\partial t}(\bar x,t) + H(\bar x,u^*,v^*,J_x^*;t) = 0$$

$$\frac{\partial J_x^*}{\partial t}(\bar x,t) + H_x(\bar x,u^*,v^*,J_x^*;t) + J_{xx}^*(\bar x,t)\,f(\bar x,u^*,v^*;t) = 0 \qquad \text{(C-4)}$$

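Employing expressions for the total derivatives of J* and J*_x,

$$\frac{dJ^*}{dt} = \frac{\partial J^*}{\partial t} + J_x^{*T} f(\bar x,\bar u,\bar v;t), \qquad \frac{dJ_x^*}{dt} = \frac{\partial J_x^*}{\partial t} + J_{xx}^*\,f(\bar x,\bar u,\bar v;t) \qquad \text{(C-5)}$$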


and Eq (3-4),

$$a(t) = J^*(\bar x,t) - \bar J(\bar x,t) \qquad \text{(C-6)}$$


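in Eqs (C-4), the DDP equations are determined as (Ref 8:3-5):

$$-\frac{da}{dt} = H(\bar x,u^*,v^*,J_x^*;t) - H(\bar x,\bar u,\bar v,J_x^*;t)$$

$$-\frac{dJ_x^*}{dt} = H_x(\bar x,u^*,v^*,J_x^*;t) + J_{xx}^*\left[f(\bar x,u^*,v^*;t) - f(\bar x,\bar u,\bar v;t)\right] \qquad \text{(C-7)}$$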
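The step from Eqs (C-4) to Eqs (C-7) can be written out explicitly. The lines below assume that H = L + J_x^{*T} f, with L the running-cost integrand (zero if the payoff is purely terminal), and that the cost of the nominal trajectory satisfies dJ̄/dt = -L(x̄, ū, v̄; t) along x̄; both are assumptions about notation defined in Chapter III. Total derivatives are taken along the nominal trajectory, and the relations used from (C-4) are ∂J*/∂t = -H(x̄, u*, v*, J*_x; t) and ∂J*_x/∂t = -H_x(x̄, u*, v*, J*_x; t) - J*_xx f(x̄, u*, v*; t).

```latex
\begin{aligned}
\frac{da}{dt} &= \frac{dJ^*}{dt} - \frac{d\bar J}{dt}
  = \frac{\partial J^*}{\partial t}(\bar x,t)
    + J_x^{*T} f(\bar x,\bar u,\bar v;t) + L(\bar x,\bar u,\bar v;t) \\
  &= -H(\bar x,u^*,v^*,J_x^*;t) + H(\bar x,\bar u,\bar v,J_x^*;t), \\[4pt]
\frac{dJ_x^*}{dt} &= \frac{\partial J_x^*}{\partial t}(\bar x,t)
    + J_{xx}^*\, f(\bar x,\bar u,\bar v;t) \\
  &= -H_x(\bar x,u^*,v^*,J_x^*;t)
    - J_{xx}^*\left[f(\bar x,u^*,v^*;t) - f(\bar x,\bar u,\bar v;t)\right].
\end{aligned}
```

Negating both results gives the two DDP equations: the predicted change a is driven by the Hamiltonian gap between the saddle-point and nominal controls, and J*_x is corrected by the difference in dynamics between the two control sets.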
Appendix D

Adjustment of the Penalty Functions

To derive the most benefit from the CCP technique, the penalty function values must be altered based upon the reaction of the problem to the present values. When convergence is indicated, the values should be decreased. The following discussion forms a basis for the alteration of the penalties. The following figure is helpful in determining how to adjust P_e and P_p (Ref 9:30).


Fig. 31 Convergence Domain

If the ratio of the actual cost change, ΔJ, to the predicted cost change, a(t_0), falls within:


Area A - The expansion of Eq (3-1) is not valid (Δx is too large) and the penalty values should be increased significantly. This indicates that the pursuer's penalty is dominant and should be increased more than the evader's.

Area B - The expansion is poorly satisfied (the pursuer's penalty still dominates) and an increase in P_p is indicated.

Area C - The expansion is satisfied and both penalty values should be decreased.

Area D - This is similar to Area B, but the evader's penalty, P_e, should be increased in this case.

For  points  within  Areas  A'  - D',  the  rules  pertaining 
to  Areas  A - D apply  with  the  evader  and  pursuer  philosophies 
reversed. 

It is wise to try to keep the components of a(t_0), a_e(t_0) and a_p(t_0), of the same order of magnitude by varying the ratio of P_p to P_e (Ref 8:31).

One area which represents a "special case" is the region close to the origin in Areas A and A'. In these areas, the predicted cost change is small; however, the components, a_e(t_0) and a_p(t_0), may be large. In this situation, both penalties may be reduced moderately.
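The rules above can be sketched as a small update function. Because Fig. 31 is not reproduced, the sketch cannot recover the actual region boundaries. It assumes a hypothetical classification: the ratio r = ΔJ/a(t_0) outside 0 to 2 marks Area A or A' (expansion invalid, iteration rejected as in Appendix B), r between .5 and 1.5 marks Area C, and the remaining band stands for Areas B/B' and D/D'. Which player is "dominant" is decided by comparing |a_e(t_0)| and |a_p(t_0)|, and a(t_0) is taken to be their sum. The thresholds, scale factors, and the name `adjust_penalties` are all assumptions; only the direction of each adjustment follows the text.

```python
def adjust_penalties(actual, a_e, a_p, p_e, p_p,
                     up=4.0, down=0.5, moderate=0.8, near_zero=1e-3,
                     valid=(0.5, 1.5), usable=(0.0, 2.0)):
    """Hypothetical penalty update patterned on the Appendix D rules.

    actual   -- actual cost change, Delta J
    a_e, a_p -- evader and pursuer components of the predicted change,
                assumed here to sum to a(t0)
    Returns (accept_iteration, new_p_e, new_p_p).
    """
    scale = max(abs(a_e), abs(a_p))
    if scale == 0.0:
        return True, p_e, p_p                    # nothing left to improve
    predicted = a_e + a_p
    pursuer_dominates = abs(a_p) >= abs(a_e)
    if abs(predicted) < near_zero * scale:
        # "Special case" near the origin: small prediction, large components.
        return False, p_e * moderate, p_p * moderate
    ratio = actual / predicted
    if not (usable[0] <= ratio <= usable[1]):
        # Areas A / A': expansion not valid. Reject the iteration and raise
        # both penalties, the dominant player's by more.
        if pursuer_dominates:
            return False, p_e * up, p_p * up * up
        return False, p_e * up * up, p_p * up
    if valid[0] <= ratio <= valid[1]:
        # Area C: expansion satisfied, so both penalties are lowered.
        return True, p_e * down, p_p * down
    # Areas B / D (and primes): poorly satisfied; raise the dominant penalty only.
    if pursuer_dominates:
        return True, p_e, p_p * up
    return True, p_e * up, p_p


print(adjust_penalties(1.0, 0.5, 0.5, 1.0, 1.0))    # Area C
print(adjust_penalties(-3.0, 1.0, 2.0, 1.0, 1.0))   # Area A, pursuer dominant
```

Raising the dominant player's penalty more than the other's is one way to act on the advice to keep a_e(t_0) and a_p(t_0) of the same order of magnitude.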
Appendix E


Numerical  Aspects 

This  Appendix  presents  the  numerical  aspects  of  the 
problem.  The  values  of  the  constants  used  are  presented 
to  aid  further  research  in  this  area. 

Aircraft  Equations 

Tax  (lb*)  = (l234 fc.7  - .7048  y + '«•'«'  v) 

CD  - . OMoir  + .Z13  C 

= .IW  ♦ (.OHlA-.3o<x/c\)(v-*S’o) 
ml  r .ZSUS  CL 

s(ft‘)  = *3  0 

10  (Its)  = 40,0  0 0 

Missile  Equations  and  General  Constants 


C6  : .^  + .041  C* 

s(ft‘)  = «3 

U (Ik.)  = '03 


.00237fc^  T^r 

-f  r -• 

3 .S*i*  H 


1.2  fi!  J 